Strange process behavior

Strange process behavior

am 20.09.2010 17:52:50 von Sharon Pattison

--000e0cd146c6a0cdd80490b2e764
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Hi,


We are experiencing a strange problem on our web servers for our site, and
are not sure if the problem is mod_perl related, but thought we would turn
here for help in case someone else has experienced this issue.
Unfortunately, we are not sure exactly when the problem started, so we can=
=92t
point to any particular change that might have prompted it.



We are using Apache 1.3.42 with mod_perl 1.31 and Perl 5.8.3 on Solaris 9.
What is happening is that a particular Apache child process starts having a=
n
issue where the wrong content is being returned for a request. The content
returned is always from the correct script, but it has the wrong data. For
example, if we request a product detail page for a particular product, we
might get a different product, or we might get a message that the product
does not exist, when it does.



After adding a lot of logging to our logs, we discovered that the Apache
request object will have the correct information, but the CGI object never
seems to get the global values reset, so is always has the query_string of
the request where things started to go wrong. The process will continue to
get and respond to requests, but the CGI object=92s query_string never
changes, so the user either gets no information or the wrong information.



For example, if we had a request for mysite.com/product?sku=3D12345, and th=
e
process that served this request starts to have the problem, then we see
that for every request that process gets, logging $q->query_string gives us
=91sku=3D12345=92. It seems like the initialize_globals() CGI method is ne=
ver
being executed in the process cleanup.



There doesn=92t seem to be any particular request that triggers this
behavior. We have checked the server load when we discover the issue is
occurring and everything seems normal. The problem process does not seem t=
o
be using any more memory or CPU than any of the other processes, and the
number of processes is fine.



Has anyone experienced anything like this? Any ideas how we can track down
the problem?



Thank you in advance for any help you can give us.



Sharon

--000e0cd146c6a0cdd80490b2e764
Content-Type: text/html; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Hi,

=A0


face=3D"Calibri">We are experiencing a strange problem on our web servers =
for our site, and are not sure if the problem is mod_perl related, but thou=
ght we would turn here for help in case someone else has experienced this i=
ssue.=A0 Unfortunately, we are not sure exactly when the problem started, s=
o we can=92t point to any particular change that might have prompted it. ont>



=3D"Calibri">=A0


face=3D"Calibri">We are using Apache 1.3.42 with mod_perl 1.31 and Perl 5.=
8.3 on Solaris 9.=A0 What is happening is that a particular Apache child pr=
ocess starts having an issue where the wrong content is being returned for =
a request.=A0 The content returned is always from the correct script, but i=
t has the wrong data.=A0 For example, if we request a product detail page f=
or a particular product, we might get a different product, or we might get =
a message that the product does not exist, when it does.



=3D"Calibri">=A0


face=3D"Calibri">After adding a lot of logging to our logs, we discovered =
that the Apache request object will have the correct information, but the C=
GI object never seems to get the global values reset, so is always has the =
query_string of the request where things started to go wrong.=A0 The proces=
s will continue to get and respond to requests, but the CGI object=92s quer=
y_string never changes, so the user either gets no information or the wrong=
information.



=3D"Calibri">=A0


face=3D"Calibri">For example, if we had a request for site.com/product?sku=3D12345">mysite.com/product?sku=3D12345, and the p=
rocess that served this request starts to have the problem, then we see tha=
t for every request that process gets, logging $q->query_string gives us=
=91sku=3D12345=92.=A0 It seems like the initialize_globals() CGI method is=
never being executed in the process cleanup.



=3D"Calibri">=A0


face=3D"Calibri">There doesn=92t seem to be any particular request that tr=
iggers this behavior.=A0 We have checked the server load when we discover t=
he issue is occurring and everything seems normal.=A0 The problem process d=
oes not seem to be using any more memory or CPU than any of the other proce=
sses, and the number of processes is fine.



=3D"Calibri">=A0


face=3D"Calibri">Has anyone experienced anything like this?=A0 Any ideas h=
ow we can track down the problem?


face=3D"Calibri">=A0


face=3D"Calibri">Thank you in advance for any help you can give us.=


=3D"Calibri">=A0


=3D"Calibri">Sharon



--000e0cd146c6a0cdd80490b2e764--

Re: Strange process behavior

am 20.09.2010 19:11:35 von Perrin Harkins

Hi,

> After adding a lot of logging to our logs, we discovered that the Apache
> request object will have the correct information, but the CGI object neve=
r
> seems to get the global values reset, so is always has the query_string o=
f
> the request where things started to go wrong.=A0 The process will continu=
e to
> get and respond to requests, but the CGI object=92s query_string never
> changes, so the user either gets no information or the wrong information.

This can happen when you do an internal redirect, which does not
trigger a cleanup.

Some options:
- Use something else instead of CGI.pm. Apache::Request is one possibility=
..
- Put in a call to initialize_globals() in an early phase, like
FixupHandler, so you know it has been run.

If the process seems to become broken and always serve the wrong pages
after a certain point, you may have something else wrong, like a
variable scoping issue.

- Perrin

Re: Strange process behavior

am 20.09.2010 22:23:09 von Perrin Harkins

[ please reply-all to keep it on the list ]

On Mon, Sep 20, 2010 at 4:07 PM, Sharon Pattison wrote=
:
> We would still like to track down the cause of the problem, if we can.=A0=
The
> process does indeed seem to become broken at a particular point and the C=
GI
> object has an incorrect query_string for each request until the process
> dies.

That sounds like it may be a problem with the scoping of the variables
you load data into from CGI rather than CGI itself. Hard to say until
you get a way to reproduce it.

> If you have any other suggestions for finding the root of the problem, we
> would love to hear your ideas.

If you can write a test to determine if this is happening, you can log
the process ID and then use that to look back through your access_log
(if you log process ID there) and see what the sequence was that broke
the process. Try to write a test that runs at some point in the
request and checks the URL against the values you see from CGI.pm.

- Perrin

Re: Strange process behavior

am 21.09.2010 14:28:47 von Tuomo Salo

Are you getting "variable will not stay shared" messages in your error_log?

If you are, you might be using a lexical variable outside of the function
it was defined in. When this happens, you'll get a closure, which is a
useful tool sometimes, but for us regular people it means that old data
will unexpectedly stay in the variables from one request to the other.

The fix is to make sure that every variable (that is not meant to be global)
is declared as "my" inside any sub where it is used.

http://modperlbook.org/html/22-2-1-Value-of-x-will-not-stay- shared-at-line-5.html

-Tuomo

On Mon, Sep 20, 2010 at 11:52:50AM -0400, Sharon Pattison wrote:
> Hi,
>
>
> We are experiencing a strange problem on our web servers for our site, and
> are not sure if the problem is mod_perl related, but thought we would turn
> here for help in case someone else has experienced this issue.
> Unfortunately, we are not sure exactly when the problem started, so we can’t
> point to any particular change that might have prompted it.
>
>
>
> We are using Apache 1.3.42 with mod_perl 1.31 and Perl 5.8.3 on Solaris 9.
> What is happening is that a particular Apache child process starts having an
> issue where the wrong content is being returned for a request. The content
> returned is always from the correct script, but it has the wrong data. For
> example, if we request a product detail page for a particular product, we
> might get a different product, or we might get a message that the product
> does not exist, when it does.
>
>
>
> After adding a lot of logging to our logs, we discovered that the Apache
> request object will have the correct information, but the CGI object never
> seems to get the global values reset, so is always has the query_string of
> the request where things started to go wrong. The process will continue to
> get and respond to requests, but the CGI object’s query_string never
> changes, so the user either gets no information or the wrong information.
>
>
>
> For example, if we had a request for mysite.com/product?sku=12345, and the
> process that served this request starts to have the problem, then we see
> that for every request that process gets, logging $q->query_string gives us
> ‘sku=12345’. It seems like the initialize_globals() CGI method is never
> being executed in the process cleanup.
>
>
>
> There doesn’t seem to be any particular request that triggers this
> behavior. We have checked the server load when we discover the issue is
> occurring and everything seems normal. The problem process does not seem to
> be using any more memory or CPU than any of the other processes, and the
> number of processes is fine.
>
>
>
> Has anyone experienced anything like this? Any ideas how we can track down
> the problem?
>
>
>
> Thank you in advance for any help you can give us.
>
>
>
> Sharon

Re: Strange process behavior

am 21.09.2010 18:09:57 von Sharon Pattison

--000e0cd32b8aa4d5680490c74223
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Hi,

I think I sent my last reply just to Tuomo, so am sending again to the
list......

Thanks for the suggestion. I checked our error logs and did not find that
message anywhere.

We added some logging to several of our scripts to log when $r->args and
$q->query_string do not match. What we find for the bad process is that $r
always has the correct data for the new request, but the value of
$q->query_string never changes once the problem starts. Here's a shortened
example of what we are logging:

[Sat Sep 18 01:14:06 2010] [error] INFO LOGGER: Mismatched query strings: =
r
=3D item_code=3DWW&item_no=3D293845&view=3Ddetails, q =3D sku=3D828752&slid=
e=3D2||PID:
2309
[Sat Sep 18 01:14:10 2010] [error] INFO LOGGER: Mismatched query strings: =
r
=3D sp=3D102686, q =3D sku=3D828752&slide=3D2||PID: 2309
[Sat Sep 18 01:14:12 2010] [error] INFO LOGGER: Mismatched query strings: =
r
=3D , q =3D sku=3D828752&slide=3D2||PID: 2309
[Sat Sep 18 01:14:20 2010] [error] INFO LOGGER: Mismatched query strings: =
r
=3D N=3D1108281+5401+10006, q =3D sku=3D828752&slide=3D2||PID: 2309
[Sat Sep 18 01:18:53 2010] [error] INFO LOGGER: Mismatched query strings: =
r
=3D item_no=3D312108, q =3D sku=3D828752&slide=3D2||PID: 2309

This will continue until the process with PID 2309 dies, and it happens wit=
h
any script that is requested.

We are doing some internal redirects as Perrin mentioned, but we are not
redirecting every time we see this happen, so I can't say that redirects ar=
e
causing the problem. At the same time this is happening, all other child
processes are behaving normally.

Sharon

On Tue, Sep 21, 2010 at 8:28 AM, Tuomo Salo wrote=
:

> Are you getting "variable will not stay shared" messages in your error_lo=
g?
>
> If you are, you might be using a lexical variable outside of the function
> it was defined in. When this happens, you'll get a closure, which is a
> useful tool sometimes, but for us regular people it means that old data
> will unexpectedly stay in the variables from one request to the other.
>
> The fix is to make sure that every variable (that is not meant to be
> global)
> is declared as "my" inside any sub where it is used.
>
>
> http://modperlbook.org/html/22-2-1-Value-of-x-will-not-stay- shared-at-lin=
e-5.html
>
> -Tuomo
>
> On Mon, Sep 20, 2010 at 11:52:50AM -0400, Sharon Pattison wrote:
> > Hi,
> >
> >
> > We are experiencing a strange problem on our web servers for our site,
> and
> > are not sure if the problem is mod_perl related, but thought we would
> turn
> > here for help in case someone else has experienced this issue.
> > Unfortunately, we are not sure exactly when the problem started, so we
> can=92t
> > point to any particular change that might have prompted it.
> >
> >
> >
> > We are using Apache 1.3.42 with mod_perl 1.31 and Perl 5.8.3 on Solaris
> 9.
> > What is happening is that a particular Apache child process starts havi=
ng
> an
> > issue where the wrong content is being returned for a request. The
> content
> > returned is always from the correct script, but it has the wrong data.
> For
> > example, if we request a product detail page for a particular product, =
we
> > might get a different product, or we might get a message that the produ=
ct
> > does not exist, when it does.
> >
> >
> >
> > After adding a lot of logging to our logs, we discovered that the Apach=
e
> > request object will have the correct information, but the CGI object
> never
> > seems to get the global values reset, so is always has the query_string
> of
> > the request where things started to go wrong. The process will continu=
e
> to
> > get and respond to requests, but the CGI object=92s query_string never
> > changes, so the user either gets no information or the wrong informatio=
n.
> >
> >
> >
> > For example, if we had a request for mysite.com/product?sku=3D12345, an=
d
> the
> > process that served this request starts to have the problem, then we se=
e
> > that for every request that process gets, logging $q->query_string give=
s
> us
> > =91sku=3D12345=92. It seems like the initialize_globals() CGI method i=
s never
> > being executed in the process cleanup.
> >
> >
> >
> > There doesn=92t seem to be any particular request that triggers this
> > behavior. We have checked the server load when we discover the issue i=
s
> > occurring and everything seems normal. The problem process does not se=
em
> to
> > be using any more memory or CPU than any of the other processes, and th=
e
> > number of processes is fine.
> >
> >
> >
> > Has anyone experienced anything like this? Any ideas how we can track
> down
> > the problem?
> >
> >
> >
> > Thank you in advance for any help you can give us.
> >
> >
> >
> > Sharon
>

--000e0cd32b8aa4d5680490c74223
Content-Type: text/html; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Hi,

=A0

I think I sent my last reply just to Tuomo, so am sending again to the=
list......

=A0

Thanks for the suggestion.=A0 I checked our error logs and did not fin=
d that message anywhere.
=A0
We added some logging to several of our =
scripts to log when $r->args and $q->query_string do not match.=A0 Wh=
at we find for the bad process is that $r always has the correct data for t=
he new request, but the value of $q->query_string never changes once the=
problem starts.=A0 Here's a shortened example of what we are logging:<=
br>
=A0
[Sat Sep 18 01:14:06 2010] [error]=A0 INFO LOGGER: Mismatched query =
strings: r =3D item_code=3DWW&item_no=3D293845&view=3Ddetails, q =
=3D sku=3D828752&slide=3D2||PID: 2309
[Sat Sep 18 01:14:10 2010] [er=
ror]=A0 INFO LOGGER: Mismatched query strings: r =3D sp=3D102686, q =3D sku=
=3D828752&slide=3D2||PID: 2309

[Sat Sep 18 01:14:12 2010] [error]=A0 INFO LOGGER: Mismatched query strings=
: r =3D , q =3D sku=3D828752&slide=3D2||PID: 2309
[Sat Sep 18 01:14:=
20 2010] [error]=A0 INFO LOGGER: Mismatched query strings: r =3D N=3D110828=
1+5401+10006, q =3D sku=3D828752&slide=3D2||PID: 2309

[Sat Sep 18 01:18:53 2010] [error]=A0 INFO LOGGER: Mismatched query strings=
: r =3D item_no=3D312108, q =3D sku=3D828752&slide=3D2||PID: 2309
=
=A0
This will continue until the process with PID 2309 dies, and it happ=
ens with any script that is requested.

=A0
We are doing some internal redirects as Perrin mentioned, but we are=
not redirecting every time we see this happen, so I can't say that red=
irects are causing the problem.=A0 At the same time this is happening, all =
other child processes are behaving normally.

=A0
Sharon


On Tue, Sep 21, 2010 at 8:28 AM, Tuomo Salo n dir=3D"ltr"><Tuomo.Salo@cyb=
ercom.com
> wrote:

; PADDING-LEFT: 1ex" class=3D"gmail_quote">Are you getting "variable w=
ill not stay shared" messages in your error_log?

If you are, yo=
u might be using a lexical variable outside of the function

it was defined in. When this happens, you'll get a closure, which is a<=
br>useful tool sometimes, but for us regular people it means that old data<=
br>will unexpectedly stay in the variables from one request to the other. r>

The fix is to make sure that every variable (that is not meant to be gl=
obal)
is declared as "my" inside any sub where it is used.
=

shared-at-line-5.html" target=3D"_blank">http://modperlbook.org/html/22-2-1=
-Value-of-x-will-not-stay-shared-at-line-5.html



=A0-Tuomo




On Mon, Sep 20, 2010 at 11:52:50AM -0400, Sharon Patt=
ison wrote:
> Hi,
>
>
> We are experiencing a stran=
ge problem on our web servers for our site, and
> are not sure if the=
problem is mod_perl related, but thought we would turn

> here for help in case someone else has experienced this issue.
>=
Unfortunately, we are not sure exactly when the problem started, so we can=
=92t
> point to any particular change that might have prompted it. >
>
>
>
> We are using Apache 1.3.42 with mod_perl 1.31 =
and Perl 5.8.3 on Solaris 9.
> What is happening is that a particular=
Apache child process starts having an
> issue where the wrong conten=
t is being returned for a request. =A0The content

> returned is always from the correct script, but it has the wrong data.=
=A0For
> example, if we request a product detail page for a particul=
ar product, we
> might get a different product, or we might get a mes=
sage that the product

> does not exist, when it does.
>
>
>
> After ad=
ding a lot of logging to our logs, we discovered that the Apache
> re=
quest object will have the correct information, but the CGI object never >
> seems to get the global values reset, so is always has the query_strin=
g of
> the request where things started to go wrong. =A0The process w=
ill continue to
> get and respond to requests, but the CGI object=92s=
query_string never

> changes, so the user either gets no information or the wrong informati=
on.
>
>
>
> For example, if we had a request for href=3D"http://mysite.com/product?sku=3D12345" target=3D"_blank">mysite.co=
m/product?sku=3D12345, and the

> process that served this request starts to have the problem, then we s=
ee
> that for every request that process gets, logging $q->query_s=
tring gives us
> =91sku=3D12345=92. =A0It seems like the initialize_g=
lobals() CGI method is never

> being executed in the process cleanup.
>
>
>
>=
There doesn=92t seem to be any particular request that triggers this
&g=
t; behavior. =A0We have checked the server load when we discover the issue =
is

> occurring and everything seems normal. =A0The problem process does not=
seem to
> be using any more memory or CPU than any of the other proc=
esses, and the
> number of processes is fine.
>
>
>=


> Has anyone experienced anything like this? =A0Any ideas how we can tra=
ck down
> the problem?
>
>
>
> Thank you in a=
dvance for any help you can give us.
>
>
>
> Sharon=





--000e0cd32b8aa4d5680490c74223--

Re: Strange process behavior

am 21.09.2010 18:39:12 von Perrin Harkins

On Tue, Sep 21, 2010 at 12:09 PM, Sharon Pattison wrot=
e:
> We added some logging to several of our scripts to log when $r->args and
> $q->query_string do not match.=A0 What we find for the bad process is tha=
t $r
> always has the correct data for the new request, but the value of
> $q->query_string never changes once the problem starts.

One quick thing to check: make sure you have the very latest version
of CGI.pm installed! This often fixes mysterious CGI.pm problems.

> Here's a shortened
> example of what we are logging:

I'd suggest you log enough that you can see all the requests sent to a
process before it broke. Then try duplicating those with an LWP
script and see if you can reproduce the breakage.

- Perrin